What is the most efficient way to compute the difference of lines from two files?
I have two lists in python list_a
and list_b
. The list_a
have some images links, and the list_b
too. 99% of the items are the same, but i have to know this 1%. The all surplus items are in list_a
, that means all items in list_b
are in list_a
. My initial idea is subtract all items:
list_a - list_b = list_c
, where the list_c
are my surplus items. My code is:
list_a =
list_b =
list_c =
arq_b = open('list_b.txt','r')
for b in arq_b:
list_b.append(b)
arq_a = open('list_a.txt','r')
for a in arq_a:
if a not in arq_b:
list_c.append(a)
arq_c = open('list_c.txt','w')
for c in list_c:
arq_c.write(c)
I think the logic is right, if i have some items, the code is run fast. But i dont have 10 items, or 1.000, or even 100.000. I have 78.514.022
items in my list_b.txt
and 78.616.777
in my list list_a.txt
. I dont't know the cost of this expression: if a not in arq_b
. But if i execute this code, i think wont finish in this year.
My pc have 8GB, and i allocate 15gb for swap to not explode my RAM.
My question is, there's another way to make this operation more efficiently(Faster)?
- The
list_a
is ordinate but thelist_b
not. - Each item have this size:
images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png
- The order doesnt matter, i want know the surplus.
python python-3.x list performance
|
show 1 more comment
I have two lists in python list_a
and list_b
. The list_a
have some images links, and the list_b
too. 99% of the items are the same, but i have to know this 1%. The all surplus items are in list_a
, that means all items in list_b
are in list_a
. My initial idea is subtract all items:
list_a - list_b = list_c
, where the list_c
are my surplus items. My code is:
list_a =
list_b =
list_c =
arq_b = open('list_b.txt','r')
for b in arq_b:
list_b.append(b)
arq_a = open('list_a.txt','r')
for a in arq_a:
if a not in arq_b:
list_c.append(a)
arq_c = open('list_c.txt','w')
for c in list_c:
arq_c.write(c)
I think the logic is right, if i have some items, the code is run fast. But i dont have 10 items, or 1.000, or even 100.000. I have 78.514.022
items in my list_b.txt
and 78.616.777
in my list list_a.txt
. I dont't know the cost of this expression: if a not in arq_b
. But if i execute this code, i think wont finish in this year.
My pc have 8GB, and i allocate 15gb for swap to not explode my RAM.
My question is, there's another way to make this operation more efficiently(Faster)?
- The
list_a
is ordinate but thelist_b
not. - Each item have this size:
images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png
- The order doesnt matter, i want know the surplus.
python python-3.x list performance
5
Does the order matter? If not, try using sets. With sets, subtraction should be linear:set_c = set_a - set_b
.
– L3viathan
Jan 10 at 12:37
But is possible make this in python?
– Vinicius Morais
Jan 10 at 12:39
The python will use the most efficient way to make this operation?
– Vinicius Morais
Jan 10 at 12:40
1
Yes, I mean the Python datatypeset
.
– L3viathan
Jan 10 at 12:40
1
@tripleee It's not a duplicate of that - that question is about mapping subtraction over a list, this question is about the difference between what's included in the lists.
– SpoonMeiser
Jan 10 at 12:48
|
show 1 more comment
I have two lists in python list_a
and list_b
. The list_a
have some images links, and the list_b
too. 99% of the items are the same, but i have to know this 1%. The all surplus items are in list_a
, that means all items in list_b
are in list_a
. My initial idea is subtract all items:
list_a - list_b = list_c
, where the list_c
are my surplus items. My code is:
list_a =
list_b =
list_c =
arq_b = open('list_b.txt','r')
for b in arq_b:
list_b.append(b)
arq_a = open('list_a.txt','r')
for a in arq_a:
if a not in arq_b:
list_c.append(a)
arq_c = open('list_c.txt','w')
for c in list_c:
arq_c.write(c)
I think the logic is right, if i have some items, the code is run fast. But i dont have 10 items, or 1.000, or even 100.000. I have 78.514.022
items in my list_b.txt
and 78.616.777
in my list list_a.txt
. I dont't know the cost of this expression: if a not in arq_b
. But if i execute this code, i think wont finish in this year.
My pc have 8GB, and i allocate 15gb for swap to not explode my RAM.
My question is, there's another way to make this operation more efficiently(Faster)?
- The
list_a
is ordinate but thelist_b
not. - Each item have this size:
images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png
- The order doesnt matter, i want know the surplus.
python python-3.x list performance
I have two lists in python list_a
and list_b
. The list_a
have some images links, and the list_b
too. 99% of the items are the same, but i have to know this 1%. The all surplus items are in list_a
, that means all items in list_b
are in list_a
. My initial idea is subtract all items:
list_a - list_b = list_c
, where the list_c
are my surplus items. My code is:
list_a =
list_b =
list_c =
arq_b = open('list_b.txt','r')
for b in arq_b:
list_b.append(b)
arq_a = open('list_a.txt','r')
for a in arq_a:
if a not in arq_b:
list_c.append(a)
arq_c = open('list_c.txt','w')
for c in list_c:
arq_c.write(c)
I think the logic is right, if i have some items, the code is run fast. But i dont have 10 items, or 1.000, or even 100.000. I have 78.514.022
items in my list_b.txt
and 78.616.777
in my list list_a.txt
. I dont't know the cost of this expression: if a not in arq_b
. But if i execute this code, i think wont finish in this year.
My pc have 8GB, and i allocate 15gb for swap to not explode my RAM.
My question is, there's another way to make this operation more efficiently(Faster)?
- The
list_a
is ordinate but thelist_b
not. - Each item have this size:
images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png
- The order doesnt matter, i want know the surplus.
python python-3.x list performance
python python-3.x list performance
edited Jan 10 at 20:14
Jean-François Fabre
102k954111
102k954111
asked Jan 10 at 12:36
Vinicius MoraisVinicius Morais
19710
19710
5
Does the order matter? If not, try using sets. With sets, subtraction should be linear:set_c = set_a - set_b
.
– L3viathan
Jan 10 at 12:37
But is possible make this in python?
– Vinicius Morais
Jan 10 at 12:39
The python will use the most efficient way to make this operation?
– Vinicius Morais
Jan 10 at 12:40
1
Yes, I mean the Python datatypeset
.
– L3viathan
Jan 10 at 12:40
1
@tripleee It's not a duplicate of that - that question is about mapping subtraction over a list, this question is about the difference between what's included in the lists.
– SpoonMeiser
Jan 10 at 12:48
|
show 1 more comment
5
Does the order matter? If not, try using sets. With sets, subtraction should be linear:set_c = set_a - set_b
.
– L3viathan
Jan 10 at 12:37
But is possible make this in python?
– Vinicius Morais
Jan 10 at 12:39
The python will use the most efficient way to make this operation?
– Vinicius Morais
Jan 10 at 12:40
1
Yes, I mean the Python datatypeset
.
– L3viathan
Jan 10 at 12:40
1
@tripleee It's not a duplicate of that - that question is about mapping subtraction over a list, this question is about the difference between what's included in the lists.
– SpoonMeiser
Jan 10 at 12:48
5
5
Does the order matter? If not, try using sets. With sets, subtraction should be linear:
set_c = set_a - set_b
.– L3viathan
Jan 10 at 12:37
Does the order matter? If not, try using sets. With sets, subtraction should be linear:
set_c = set_a - set_b
.– L3viathan
Jan 10 at 12:37
But is possible make this in python?
– Vinicius Morais
Jan 10 at 12:39
But is possible make this in python?
– Vinicius Morais
Jan 10 at 12:39
The python will use the most efficient way to make this operation?
– Vinicius Morais
Jan 10 at 12:40
The python will use the most efficient way to make this operation?
– Vinicius Morais
Jan 10 at 12:40
1
1
Yes, I mean the Python datatype
set
.– L3viathan
Jan 10 at 12:40
Yes, I mean the Python datatype
set
.– L3viathan
Jan 10 at 12:40
1
1
@tripleee It's not a duplicate of that - that question is about mapping subtraction over a list, this question is about the difference between what's included in the lists.
– SpoonMeiser
Jan 10 at 12:48
@tripleee It's not a duplicate of that - that question is about mapping subtraction over a list, this question is about the difference between what's included in the lists.
– SpoonMeiser
Jan 10 at 12:48
|
show 1 more comment
4 Answers
4
active
oldest
votes
you can create one set of the first file contents, then just use difference
or symmetric_difference
depending on what you call a difference
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
diffs = set_a.difference(f)
if list_b.txt
contains more items than list_a.txt
you want to swap them or use set_a.symmetric_difference(f)
instead, depending on what you need.
difference(f)
works but still has to construct a new set
internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.
Nice, this avoids having to allocate space for the second set.
– L3viathan
Jan 10 at 12:54
1
Well, not really, because internally aset
is created, then thrown away. but it's thrown away faster
– Jean-François Fabre
Jan 10 at 12:54
But the complexity is the same of subtract sets?
– Vinicius Morais
Jan 10 at 13:00
@ViniciusMorais The time complexity is the same, the space complexity (apparently), too.
– L3viathan
Jan 10 at 13:45
1
@L3viathan In case the original list (the original set) is not needed anymore you can usedifference_update
. This should not require to allocate a new set internally.
– a_guest
Jan 10 at 14:33
add a comment |
Try using sets:
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
set_b = set(f)
set_c = set_a - set_b
with open("list_c.txt","w") as f:
for c in set_c:
f.write(c)
The complexity of subtracting two sets is O(n) in the size of the set a.
2
You know - an open file is an iterator - therefore you can simply doset_a = set(open("list_a.txt"))
– jsbueno
Jan 10 at 12:47
11
yes but doingset(f)
in with block ensures that it closes the file
– Jean-François Fabre
Jan 10 at 12:50
add a comment |
To extend the comment of @L3viathan
If order of element is not important set is the rigth way.
here a dummy example you can adapt:
l1 = [0,1,2,3,4,5]
l2 = [3,4,5]
setL1 = set(l1) # transform the list into a set
setL2 = set(l2)
setDiff = setl1 - setl2 # make the difference
listeDiff = list(setDiff) # if you want to have your element back in a list
as you see is pretty straightforward in python.
add a comment |
In case order matters you can presort the lists together with item indices and then iterate over them together:
list_2 = sorted(list_2)
diff_idx =
j = 0
for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
if x != list_2[j]:
diff_idx.append(i)
else:
j += 1
diff = [list_1[i] for i in sorted(diff_idx)]
This has time complexity of the sorting algorithm, i.e. O(n*log n).
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54128876%2fwhat-is-the-most-efficient-way-to-compute-the-difference-of-lines-from-two-files%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
you can create one set of the first file contents, then just use difference
or symmetric_difference
depending on what you call a difference
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
diffs = set_a.difference(f)
if list_b.txt
contains more items than list_a.txt
you want to swap them or use set_a.symmetric_difference(f)
instead, depending on what you need.
difference(f)
works but still has to construct a new set
internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.
Nice, this avoids having to allocate space for the second set.
– L3viathan
Jan 10 at 12:54
1
Well, not really, because internally aset
is created, then thrown away. but it's thrown away faster
– Jean-François Fabre
Jan 10 at 12:54
But the complexity is the same of subtract sets?
– Vinicius Morais
Jan 10 at 13:00
@ViniciusMorais The time complexity is the same, the space complexity (apparently), too.
– L3viathan
Jan 10 at 13:45
1
@L3viathan In case the original list (the original set) is not needed anymore you can usedifference_update
. This should not require to allocate a new set internally.
– a_guest
Jan 10 at 14:33
add a comment |
you can create one set of the first file contents, then just use difference
or symmetric_difference
depending on what you call a difference
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
diffs = set_a.difference(f)
if list_b.txt
contains more items than list_a.txt
you want to swap them or use set_a.symmetric_difference(f)
instead, depending on what you need.
difference(f)
works but still has to construct a new set
internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.
Nice, this avoids having to allocate space for the second set.
– L3viathan
Jan 10 at 12:54
1
Well, not really, because internally aset
is created, then thrown away. but it's thrown away faster
– Jean-François Fabre
Jan 10 at 12:54
But the complexity is the same of subtract sets?
– Vinicius Morais
Jan 10 at 13:00
@ViniciusMorais The time complexity is the same, the space complexity (apparently), too.
– L3viathan
Jan 10 at 13:45
1
@L3viathan In case the original list (the original set) is not needed anymore you can usedifference_update
. This should not require to allocate a new set internally.
– a_guest
Jan 10 at 14:33
add a comment |
you can create one set of the first file contents, then just use difference
or symmetric_difference
depending on what you call a difference
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
diffs = set_a.difference(f)
if list_b.txt
contains more items than list_a.txt
you want to swap them or use set_a.symmetric_difference(f)
instead, depending on what you need.
difference(f)
works but still has to construct a new set
internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.
you can create one set of the first file contents, then just use difference
or symmetric_difference
depending on what you call a difference
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
diffs = set_a.difference(f)
if list_b.txt
contains more items than list_a.txt
you want to swap them or use set_a.symmetric_difference(f)
instead, depending on what you need.
difference(f)
works but still has to construct a new set
internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.
edited Jan 10 at 12:54
answered Jan 10 at 12:52
Jean-François FabreJean-François Fabre
102k954111
102k954111
Nice, this avoids having to allocate space for the second set.
– L3viathan
Jan 10 at 12:54
1
Well, not really, because internally aset
is created, then thrown away. but it's thrown away faster
– Jean-François Fabre
Jan 10 at 12:54
But the complexity is the same of subtract sets?
– Vinicius Morais
Jan 10 at 13:00
@ViniciusMorais The time complexity is the same, the space complexity (apparently), too.
– L3viathan
Jan 10 at 13:45
1
@L3viathan In case the original list (the original set) is not needed anymore you can usedifference_update
. This should not require to allocate a new set internally.
– a_guest
Jan 10 at 14:33
add a comment |
Nice, this avoids having to allocate space for the second set.
– L3viathan
Jan 10 at 12:54
1
Well, not really, because internally aset
is created, then thrown away. but it's thrown away faster
– Jean-François Fabre
Jan 10 at 12:54
But the complexity is the same of subtract sets?
– Vinicius Morais
Jan 10 at 13:00
@ViniciusMorais The time complexity is the same, the space complexity (apparently), too.
– L3viathan
Jan 10 at 13:45
1
@L3viathan In case the original list (the original set) is not needed anymore you can usedifference_update
. This should not require to allocate a new set internally.
– a_guest
Jan 10 at 14:33
Nice, this avoids having to allocate space for the second set.
– L3viathan
Jan 10 at 12:54
Nice, this avoids having to allocate space for the second set.
– L3viathan
Jan 10 at 12:54
1
1
Well, not really, because internally a
set
is created, then thrown away. but it's thrown away faster– Jean-François Fabre
Jan 10 at 12:54
Well, not really, because internally a
set
is created, then thrown away. but it's thrown away faster– Jean-François Fabre
Jan 10 at 12:54
But the complexity is the same of subtract sets?
– Vinicius Morais
Jan 10 at 13:00
But the complexity is the same of subtract sets?
– Vinicius Morais
Jan 10 at 13:00
@ViniciusMorais The time complexity is the same, the space complexity (apparently), too.
– L3viathan
Jan 10 at 13:45
@ViniciusMorais The time complexity is the same, the space complexity (apparently), too.
– L3viathan
Jan 10 at 13:45
1
1
@L3viathan In case the original list (the original set) is not needed anymore you can use
difference_update
. This should not require to allocate a new set internally.– a_guest
Jan 10 at 14:33
@L3viathan In case the original list (the original set) is not needed anymore you can use
difference_update
. This should not require to allocate a new set internally.– a_guest
Jan 10 at 14:33
add a comment |
Try using sets:
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
set_b = set(f)
set_c = set_a - set_b
with open("list_c.txt","w") as f:
for c in set_c:
f.write(c)
The complexity of subtracting two sets is O(n) in the size of the set a.
2
You know - an open file is an iterator - therefore you can simply doset_a = set(open("list_a.txt"))
– jsbueno
Jan 10 at 12:47
11
yes but doingset(f)
in with block ensures that it closes the file
– Jean-François Fabre
Jan 10 at 12:50
add a comment |
Try using sets:
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
set_b = set(f)
set_c = set_a - set_b
with open("list_c.txt","w") as f:
for c in set_c:
f.write(c)
The complexity of subtracting two sets is O(n) in the size of the set a.
2
You know - an open file is an iterator - therefore you can simply doset_a = set(open("list_a.txt"))
– jsbueno
Jan 10 at 12:47
11
yes but doingset(f)
in with block ensures that it closes the file
– Jean-François Fabre
Jan 10 at 12:50
add a comment |
Try using sets:
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
set_b = set(f)
set_c = set_a - set_b
with open("list_c.txt","w") as f:
for c in set_c:
f.write(c)
The complexity of subtracting two sets is O(n) in the size of the set a.
Try using sets:
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
set_b = set(f)
set_c = set_a - set_b
with open("list_c.txt","w") as f:
for c in set_c:
f.write(c)
The complexity of subtracting two sets is O(n) in the size of the set a.
edited Jan 10 at 12:51
answered Jan 10 at 12:43
L3viathanL3viathan
15.9k12847
15.9k12847
2
You know - an open file is an iterator - therefore you can simply doset_a = set(open("list_a.txt"))
– jsbueno
Jan 10 at 12:47
11
yes but doingset(f)
in with block ensures that it closes the file
– Jean-François Fabre
Jan 10 at 12:50
add a comment |
2
You know - an open file is an iterator - therefore you can simply doset_a = set(open("list_a.txt"))
– jsbueno
Jan 10 at 12:47
11
yes but doingset(f)
in with block ensures that it closes the file
– Jean-François Fabre
Jan 10 at 12:50
2
2
You know - an open file is an iterator - therefore you can simply do
set_a = set(open("list_a.txt"))
– jsbueno
Jan 10 at 12:47
You know - an open file is an iterator - therefore you can simply do
set_a = set(open("list_a.txt"))
– jsbueno
Jan 10 at 12:47
11
11
yes but doing
set(f)
in with block ensures that it closes the file– Jean-François Fabre
Jan 10 at 12:50
yes but doing
set(f)
in with block ensures that it closes the file– Jean-François Fabre
Jan 10 at 12:50
add a comment |
To extend the comment of @L3viathan
If order of element is not important set is the rigth way.
here a dummy example you can adapt:
l1 = [0,1,2,3,4,5]
l2 = [3,4,5]
setL1 = set(l1) # transform the list into a set
setL2 = set(l2)
setDiff = setl1 - setl2 # make the difference
listeDiff = list(setDiff) # if you want to have your element back in a list
as you see is pretty straightforward in python.
add a comment |
To extend the comment of @L3viathan
If order of element is not important set is the rigth way.
here a dummy example you can adapt:
l1 = [0,1,2,3,4,5]
l2 = [3,4,5]
setL1 = set(l1) # transform the list into a set
setL2 = set(l2)
setDiff = setl1 - setl2 # make the difference
listeDiff = list(setDiff) # if you want to have your element back in a list
as you see is pretty straightforward in python.
add a comment |
To extend the comment of @L3viathan
If order of element is not important set is the rigth way.
here a dummy example you can adapt:
l1 = [0,1,2,3,4,5]
l2 = [3,4,5]
setL1 = set(l1) # transform the list into a set
setL2 = set(l2)
setDiff = setl1 - setl2 # make the difference
listeDiff = list(setDiff) # if you want to have your element back in a list
as you see is pretty straightforward in python.
To extend the comment of @L3viathan
If order of element is not important set is the rigth way.
here a dummy example you can adapt:
l1 = [0,1,2,3,4,5]
l2 = [3,4,5]
setL1 = set(l1) # transform the list into a set
setL2 = set(l2)
setDiff = setl1 - setl2 # make the difference
listeDiff = list(setDiff) # if you want to have your element back in a list
as you see is pretty straightforward in python.
answered Jan 10 at 12:44
RomainL.RomainL.
308313
308313
add a comment |
add a comment |
In case order matters you can presort the lists together with item indices and then iterate over them together:
list_2 = sorted(list_2)
diff_idx =
j = 0
for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
if x != list_2[j]:
diff_idx.append(i)
else:
j += 1
diff = [list_1[i] for i in sorted(diff_idx)]
This has time complexity of the sorting algorithm, i.e. O(n*log n).
add a comment |
In case order matters you can presort the lists together with item indices and then iterate over them together:
list_2 = sorted(list_2)
diff_idx =
j = 0
for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
if x != list_2[j]:
diff_idx.append(i)
else:
j += 1
diff = [list_1[i] for i in sorted(diff_idx)]
This has time complexity of the sorting algorithm, i.e. O(n*log n).
add a comment |
In case order matters you can presort the lists together with item indices and then iterate over them together:
list_2 = sorted(list_2)
diff_idx =
j = 0
for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
if x != list_2[j]:
diff_idx.append(i)
else:
j += 1
diff = [list_1[i] for i in sorted(diff_idx)]
This has time complexity of the sorting algorithm, i.e. O(n*log n).
In case order matters you can presort the lists together with item indices and then iterate over them together:
list_2 = sorted(list_2)
diff_idx =
j = 0
for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
if x != list_2[j]:
diff_idx.append(i)
else:
j += 1
diff = [list_1[i] for i in sorted(diff_idx)]
This has time complexity of the sorting algorithm, i.e. O(n*log n).
edited Jan 10 at 13:04
answered Jan 10 at 12:57
a_guesta_guest
5,87321241
5,87321241
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54128876%2fwhat-is-the-most-efficient-way-to-compute-the-difference-of-lines-from-two-files%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
5
Does the order matter? If not, try using sets. With sets, subtraction should be linear:
set_c = set_a - set_b
.– L3viathan
Jan 10 at 12:37
But is possible make this in python?
– Vinicius Morais
Jan 10 at 12:39
The python will use the most efficient way to make this operation?
– Vinicius Morais
Jan 10 at 12:40
1
Yes, I mean the Python datatype
set
.– L3viathan
Jan 10 at 12:40
1
@tripleee It's not a duplicate of that - that question is about mapping subtraction over a list, this question is about the difference between what's included in the lists.
– SpoonMeiser
Jan 10 at 12:48