What is the most efficient way to compute the difference of lines from two files?



























I have two lists in Python, list_a and list_b. list_a contains some image links, and so does list_b. 99% of the items are the same, but I need to find the remaining 1%. All of the surplus items are in list_a, which means every item in list_b is also in list_a. My initial idea was to subtract one list from the other:
list_a - list_b = list_c, where list_c holds my surplus items. My code is:



list_a = []
list_b = []
list_c = []

with open('list_b.txt', 'r') as arq_b:
    for b in arq_b:
        list_b.append(b)

with open('list_a.txt', 'r') as arq_a:
    for a in arq_a:
        if a not in list_b:  # scans list_b item by item for every line of list_a
            list_c.append(a)

with open('list_c.txt', 'w') as arq_c:
    for c in list_c:
        arq_c.write(c)


I think the logic is right: with only a few items, the code runs fast. But I don't have 10 items, or 1,000, or even 100,000. I have 78,514,022 items in list_b.txt and 78,616,777 in list_a.txt. I don't know the cost of the expression if a not in list_b, but if I run this code, I don't think it will finish this year.
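A side note on the membership test: checking against the file object itself (a not in arq_b) rather than against list_b cannot work. A file object is an iterator, so once the first loop has read it to the end there is nothing left for in to look at, and the test is always true. A small sketch, with io.StringIO standing in for list_b.txt (the two lines are made up):

```python
import io

# Stand-in for open('list_b.txt'); the contents are made up.
arq_b = io.StringIO("images/a.png\nimages/b.png\n")

list_b = [line for line in arq_b]  # this loop reads arq_b to the end

# A later membership test on the exhausted iterator finds nothing.
print("images/a.png\n" in arq_b)   # False
# The list still holds every line, but `in` checks it item by item.
print("images/a.png\n" in list_b)  # True
```

So the list version is correct but does a linear scan per lookup, which is what makes it so slow at this size.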



My PC has 8 GB of RAM, and I allocated 15 GB of swap so that I don't run out of memory.



My question is: is there another, more efficient (faster) way to do this?




  • list_a is sorted, but list_b is not.

  • Each item looks like this: images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png

  • The order doesn't matter; I just want to know the surplus.






























  • Does the order matter? If not, try using sets. With sets, subtraction should be linear: set_c = set_a - set_b.

    – L3viathan
    Jan 10 at 12:37













  • But is it possible to do this in Python?

    – Vinicius Morais
    Jan 10 at 12:39











  • Will Python use the most efficient way to do this operation?

    – Vinicius Morais
    Jan 10 at 12:40






  • Yes, I mean the Python datatype set.

    – L3viathan
    Jan 10 at 12:40








  • @tripleee It's not a duplicate of that: that question is about mapping subtraction over a list, while this question is about the difference between what's included in the lists.

    – SpoonMeiser
    Jan 10 at 12:48
















Tags: python, python-3.x, list, performance






Asked Jan 10 at 12:36 by Vinicius Morais; edited Jan 10 at 20:14 by Jean-François Fabre.




















4 Answers














You can create one set from the first file's contents, then just use difference or symmetric_difference, depending on what you call a difference:



with open("list_a.txt") as f:
    set_a = set(f)

with open("list_b.txt") as f:
    diffs = set_a.difference(f)


If list_b.txt contains more items than list_a.txt, you'll want to swap them, or use set_a.symmetric_difference(f) instead, depending on what you need.



difference(f) works, but it still has to construct a new set internally. It's not a great performance gain (see the question "set issubset performance difference depending on the argument type"), but it's shorter.
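To see what the two methods return, here is a small sketch with io.StringIO objects standing in for the two files (the paths are made up). Both methods accept any iterable, which is why a file object can be passed directly; note that every element keeps its trailing newline.

```python
import io

# Stand-ins for list_a.txt and list_b.txt; the contents are made up.
file_a = io.StringIO("images/1.png\nimages/2.png\nimages/3.png\n")
file_b = io.StringIO("images/2.png\nimages/4.png\n")

set_a = set(file_a)                   # each line keeps its trailing "\n"
only_in_a = set_a.difference(file_b)  # lines in a but not in b

file_b.seek(0)                        # rewind: difference() consumed it
either_not_both = set_a.symmetric_difference(file_b)  # lines in exactly one

print(sorted(only_in_a))        # ['images/1.png\n', 'images/3.png\n']
print(sorted(either_not_both))  # ['images/1.png\n', 'images/3.png\n', 'images/4.png\n']
```

In the question's setting every line of list_b is also in list_a, so both calls would give the same result there.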
































  • Nice, this avoids having to allocate space for the second set.

    – L3viathan
    Jan 10 at 12:54






  • Well, not really, because internally a set is created and then thrown away. But it's thrown away faster.

    – Jean-François Fabre
    Jan 10 at 12:54













  • But is the complexity the same as subtracting sets?

    – Vinicius Morais
    Jan 10 at 13:00











  • @ViniciusMorais The time complexity is the same, the space complexity (apparently), too.

    – L3viathan
    Jan 10 at 13:45






  • @L3viathan In case the original list (the original set) is not needed anymore, you can use difference_update. This should not require allocating a new set internally.

    – a_guest
    Jan 10 at 14:33
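A sketch of the difference_update suggestion, assuming the full set of list_a lines is no longer needed afterwards: the set is trimmed in place, so no separate result set is built. As before, io.StringIO stands in for the files and the contents are made up.

```python
import io

# Stand-ins for list_a.txt and list_b.txt; the contents are made up.
file_a = io.StringIO("images/1.png\nimages/2.png\nimages/3.png\n")
file_b = io.StringIO("images/2.png\n")

surplus = set(file_a)              # the only set that gets built
surplus.difference_update(file_b)  # drop every line of file_b, in place
print(sorted(surplus))  # ['images/1.png\n', 'images/3.png\n']

# With the real files this would look like:
# with open("list_a.txt") as fa, open("list_b.txt") as fb:
#     surplus = set(fa)
#     surplus.difference_update(fb)
```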



































Try using sets:



with open("list_a.txt") as f:
    set_a = set(f)

with open("list_b.txt") as f:
    set_b = set(f)

set_c = set_a - set_b

with open("list_c.txt", "w") as f:
    for c in set_c:
        f.write(c)


The complexity of subtracting two sets is O(n) on average, where n is the size of set_a, since set membership checks are average O(1) hash lookups.
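One caveat with set(f): each element keeps its trailing newline, so if the last line of one file has no newline, the same path can compare unequal. A sketch that strips newlines before comparing; read_lines is a hypothetical helper name and the file contents are made up:

```python
import io

def read_lines(f):
    """Return the set of lines in f with the trailing newline removed."""
    return {line.rstrip("\n") for line in f}

# Stand-ins for the two files; file_a has no final newline.
file_a = io.StringIO("images/1.png\nimages/2.png")
file_b = io.StringIO("images/2.png\n")

set_c = read_lines(file_a) - read_lines(file_b)
print(set_c)  # {'images/1.png'}

# When writing the result out, add the newline back, e.g. f.write(c + "\n")
```

Without the rstrip, "images/2.png" and "images/2.png\n" would count as different lines and both would end up in the result.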



























  • You know, an open file is an iterator, so you can simply do set_a = set(open("list_a.txt"))

    – jsbueno
    Jan 10 at 12:47






  • Yes, but doing set(f) in a with block ensures that the file gets closed.

    – Jean-François Fabre
    Jan 10 at 12:50
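This answer builds two full sets from files of roughly 78 million lines each on an 8 GB machine, so it may be worth estimating the memory first. A rough sketch, assuming CPython, where sys.getsizeof reports a string object and a set's hash table separately (the set's size excludes the strings it holds). The result is only an order-of-magnitude estimate and will vary by Python version and platform:

```python
import sys

sample = "images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png\n"
n_lines = 78_616_777  # line count of list_a.txt, from the question

# A smaller set of distinct strings shaped like the real lines (40 hex digits).
k = 100_000
demo = {f"images/{i:040x}.png\n" for i in range(k)}

per_string = sys.getsizeof(sample)  # one str object, in bytes
per_slot = sys.getsizeof(demo) / k  # set table only, strings not included
estimate_gb = n_lines * (per_string + per_slot) / 1e9
print(f"roughly {estimate_gb:.1f} GB for one set of {n_lines:,} lines")
```

If one set alone comes close to the available RAM, building two of them will push the process into swap.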

































To extend @L3viathan's comment: if the order of the elements is not important, a set is the right way.
Here is a dummy example you can adapt:



l1 = [0, 1, 2, 3, 4, 5]
l2 = [3, 4, 5]
setL1 = set(l1)  # transform the list into a set
setL2 = set(l2)
setDiff = setL1 - setL2  # make the difference
listeDiff = list(setDiff)  # if you want your elements back in a list


As you can see, it's pretty straightforward in Python.
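If you want the surplus in the original order of the first list (a set does not keep order), a common variant, not part of this answer, is to build a set only from the second list and filter the first one against it. The lists below are made up:

```python
l1 = [5, 0, 3, 1, 4, 2]
l2 = [3, 4, 5]

setL2 = set(l2)  # average O(1) membership tests
orderedDiff = [x for x in l1 if x not in setL2]  # keeps l1's order
print(orderedDiff)  # [0, 1, 2]
```

Applied to the files, only the lines of list_b.txt would need to be held in a set, while list_a.txt could be read line by line.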



















































    In case order matters, you can presort the lists together with the item indices and then iterate over them together:



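    # Assumes every item of list_2 also occurs in list_1, as in the question.
    list_2 = sorted(list_2)
    diff_idx = []
    j = 0
    # Walk list_1 in sorted order while remembering each item's original index.
    for i, x in sorted(enumerate(list_1), key=lambda pair: pair[1]):
        if j < len(list_2) and x == list_2[j]:
            j += 1  # x is matched by the next item of list_2
        else:
            diff_idx.append(i)  # x has no partner in list_2
    diff = [list_1[i] for i in sorted(diff_idx)]  # restore the original order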


    This has the time complexity of the sorting step, i.e. O(n log n).
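The question says list_a.txt is already sorted. If list_b.txt is sorted once as well (for example with LC_ALL=C sort, whose byte order should agree with Python's string comparison for ASCII paths like these), both files can be walked in a single merge-style pass that holds only one line of each in memory. A sketch under that assumption; sorted_surplus is a hypothetical helper name and the contents are made up:

```python
import io

def sorted_surplus(lines_a, lines_b):
    """Yield the lines of lines_a that do not occur in lines_b.

    Assumes both inputs are sorted in the same (Python string) order.
    Only one line of each input is held in memory at a time.
    """
    b = iter(lines_b)
    current_b = next(b, None)
    for line in lines_a:
        # Skip lines that only occur in lines_b.
        while current_b is not None and current_b < line:
            current_b = next(b, None)
        if current_b == line:
            current_b = next(b, None)  # matched: consume it and move on
        else:
            yield line                 # no partner in lines_b

file_a = io.StringIO("images/1.png\nimages/2.png\nimages/3.png\n")
file_b = io.StringIO("images/2.png\n")
print(list(sorted_surplus(file_a, file_b)))  # ['images/1.png\n', 'images/3.png\n']
```

With real files you would pass the two open file objects and write the output with out.writelines(sorted_surplus(fa, fb)). Unlike set difference, duplicate lines are matched one to one.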






Answer by Jean-François Fabre (difference and symmetric_difference): answered Jan 10 at 12:52, edited Jan 10 at 12:54.













Answer by L3viathan ("Try using sets"): answered Jan 10 at 12:43, edited Jan 10 at 12:51.








Answer by RomainL. (dummy list example): answered Jan 10 at 12:44.























Answer by a_guest (presorting with indices): answered Jan 10 at 12:57, edited Jan 10 at 13:04.





























